A multidisciplinary ensemble algorithm for clustering heterogeneous datasets

Abstract

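Clustering is a commonly used method for exploring and analysing data, where the primary objective is to categorise observations into similar clusters. In recent decades, several algorithms and methods have been developed for analysing clustered data. We notice that most of these techniques deterministically define a cluster based on the value of the attributes, distance, and density of homogenous and single-featured datasets. However, these definitions are not successful in adding clear semantic meaning to the clusters produced. Evolutionary operators and statistical and multidisciplinary techniques may help in generating meaningful clusters. Based on this premise, we propose a new evolutionary clustering algorithm (ECA*) based on social class ranking and meta-heuristic algorithms for stochastically analysing heterogeneous and multifeatured datasets. The ECA* is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques, such as quartiles and percentiles, as well as the Euclidean distance of the K-means algorithm. Experiments are conducted to evaluate the ECA* against five conventional approaches: K-means (KM), K-means++ (KM++), expectation maximisation (EM), learning vector quantisation (LVQ), and the genetic algorithm for clustering++ (GENCLUST++). To that end, 32 heterogeneous and multifeatured datasets are used to examine their performance using internal and external and basic statistical performance evaluation measures, and to measure how sensitive their performance is to five features of these datasets (cluster overlap, the number of clusters, cluster dimensionality, the cluster structure, and the cluster shape) in the form of an operational framework. The results indicate that the ECA* surpasses its counterpart techniques in terms of the ability to find the right clusters. Significantly, compared to its counterpart techniques, the ECA* is less sensitive to the five properties of the datasets mentioned above. Thus, the order of overall performance of these algorithms, from best performing to worst performing, is the ECA*, EM, KM++, KM, LVQ, and the GENCLUST++. Meanwhile, the overall performance rank of the ECA* is 1.1 (where a rank of 1 represents the best performing algorithm and a rank of 6 refers to the worst performing algorithm) across the 32 datasets, based on the five dataset features mentioned above.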
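Two of the ingredients named in the abstract, quartiles and the Euclidean distance of K-means, can be illustrated with a small sketch. This is not the authors' ECA* procedure: the evolutionary operators, social class ranking, and Levy flight steps are omitted, and the function names (`quartile_seeds`, `assign`, `update`) and the toy points are hypothetical, chosen only for illustration.

```python
import math
from statistics import quantiles


def quartile_seeds(points):
    """Hypothetical seeding: one seed per quartile cut (Q1, Q2, Q3) of each feature."""
    columns = list(zip(*points))                      # column-wise view of the data
    cuts = [quantiles(col, n=4) for col in columns]   # three cut points per feature
    return [tuple(c[q] for c in cuts) for q in range(3)]


def assign(points, centroids):
    """K-means-style step: index of the nearest centroid by Euclidean distance."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members; empty clusters stay put."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centroids.append(old)
    return new_centroids


# Toy two-feature data (assumed for illustration, not taken from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

centroids = quartile_seeds(points)
for _ in range(10):                                   # fixed number of refinement passes
    labels = assign(points, centroids)
    centroids = update(points, labels, centroids)

print(labels)
```

Seeding from quartile cuts is deterministic, which keeps the sketch reproducible; the stochastic behaviour described in the abstract would come from the components left out here.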

Similar resources

A Mutually Supervised Ensemble Approach for Clustering Heterogeneous Datasets

We present an algorithm to address the problem of clustering two contextually related heterogeneous datasets that use different feature sets, but consist of non-disjoint sets of objects. The method is based on clustering the datasets individually and then combining the resulting clusters. The algorithm iteratively refines the two sets of clusters using a mutually supervised approach to maximize...

A Pre-Trained Ensemble Model for Breast Cancer Grade Detection Based on Small Datasets

Background and Purpose: Nowadays, breast cancer is reported as one of the most common cancers amongst women. Early detection of the cancer type is essential to aid in informing subsequent treatments. The newest proposed breast cancer detectors are based on deep learning. Most of these works focus on large datasets and are not developed for small datasets. Although the large datasets might lead ...

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In ensemble clustering, the combination first produces several base clusterings, and then an aggregation function is used to create a final clustering that is as similar as possible to all the base clusterings. The inp...

Clustering Heterogeneous Semi-structured Social Science Datasets

Social scientists have begun to collect large datasets that are heterogeneous and semi-structured, but the ability to analyze such data has lagged behind its collection. We design a process to map such datasets to a numerical form, apply singular value decomposition clustering, and explore the impact of individual attributes or fields by overlaying visualizations of the clusters. This provides ...

A Selective Fuzzy Clustering Ensemble Algorithm

To improve the performance of clustering ensemble method, a selective fuzzy clustering ensemble algorithm is proposed. It mainly includes selection of clustering ensemble members and combination of clustering results. In the process of member selection, measure method is defined to select the better clustering members. Then some selected clustering members are viewed as hyper-graph in order to ...

Journal

Journal title: Neural Computing and Applications

Year: 2021

ISSN: 0941-0643, 1433-3058

DOI: https://doi.org/10.1007/s00521-020-05649-1